
 assembly phase


A dynamic state-based model of crowds

arXiv.org Artificial Intelligence

As a discipline, crowd science has acknowledged the need to understand the nature of human collective phenomena before trying to explain them, and a number of attempts have been made to specify and classify different crowd types and behaviours. However, these typologies are often partial, over-fitted to a specific crowd type, or use arbitrary and/or subjective labels for behaviours of complex origin (for example, "panic"). Moreover, they tend to be relatively inflexible and do not reflect the fluid nature of crowd behaviour (or how this fluidity might influence the crowd's structure and impact over time). For example, a static typology might not capture a situation in which a peaceful demonstration quickly turns into a riot, or in which a physical crowd moving around a shopping mall suddenly becomes united into a psychological crowd in response to a shared grievance or an external threat. In this paper, we present an alternative to the typology approach: a dynamic, state-based model of crowds, structured around an existing assembly-action-dispersal framework. Our model draws on the statechart formalism from computer science. This approach is relatively objective, can capture the dynamic evolution of a crowd over time, and (unlike existing typologies, which are relatively static) allows for the natural description of how sub-groups emerge within a crowd. This new model may be useful for describing the evolution of incidents such as riots or emergencies, but it is equally well-suited to the study of expected, "normal" crowds.
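To make the idea of a state-based crowd model concrete, the following is a minimal sketch in the spirit of a statechart, assuming a top-level assembly/action/dispersal phase and a behaviour sub-state that is only active during the action phase. The event names (`gathered`, `shared_grievance`, `escalate`, `disperse`, `regroup`), behaviour labels, and the `Crowd` class are all hypothetical illustrations, not the authors' actual model. Events are broadcast to sub-groups, loosely mirroring how statecharts broadcast events to concurrent regions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical top-level transitions: (phase, event) -> next phase.
PHASE_TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("assembly", "gathered"): "action",
    ("action", "disperse"): "dispersal",
    ("dispersal", "regroup"): "assembly",
}

# Hypothetical nested transitions, only active while phase == "action":
# (behaviour, event) -> next behaviour.
BEHAVIOUR_TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("ambulatory", "shared_grievance"): "demonstrating",
    ("demonstrating", "escalate"): "rioting",
}


@dataclass
class Crowd:
    name: str
    phase: str = "assembly"
    behaviour: str = "ambulatory"
    subgroups: List["Crowd"] = field(default_factory=list)

    def fire(self, event: str) -> None:
        """Apply an event to this crowd, then broadcast it to every sub-group."""
        # Nested behaviour sub-state is only evaluated inside the action phase.
        if self.phase == "action":
            self.behaviour = BEHAVIOUR_TRANSITIONS.get(
                (self.behaviour, event), self.behaviour
            )
        # Top-level phase transition; unknown events leave the phase unchanged.
        self.phase = PHASE_TRANSITIONS.get((self.phase, event), self.phase)
        for group in self.subgroups:
            group.fire(event)

    def split(self, name: str, behaviour: str) -> "Crowd":
        """Spawn a sub-group that shares the parent's phase but has its own behaviour."""
        child = Crowd(name=name, phase=self.phase, behaviour=behaviour)
        self.subgroups.append(child)
        return child


if __name__ == "__main__":
    shoppers = Crowd("mall shoppers")
    shoppers.fire("gathered")          # assembly -> action
    shoppers.fire("shared_grievance")  # ambulatory -> demonstrating
    hardliners = shoppers.split("hardliners", "demonstrating")
    hardliners.fire("escalate")        # only the sub-group escalates
    shoppers.fire("disperse")          # broadcast: both groups disperse
    print(shoppers.phase, shoppers.behaviour, hardliners.phase, hardliners.behaviour)
```

In this sketch, a sub-group's behaviour can diverge from its parent's (the hardliners escalate while the wider crowd keeps demonstrating), while crowd-wide events still move every group to the next phase, which is the kind of emergent sub-group structure a static typology struggles to express.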


ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments

arXiv.org Artificial Intelligence

For embodied agents, navigation is an important ability but not an isolated goal. Agents are also expected to perform specific tasks after reaching the target location, such as picking up objects and assembling them into a particular arrangement. We combine Vision-and-Language Navigation, assembly of collected objects, and object referring expression comprehension to create a novel joint navigation-and-assembly task, named ArraMon. During this task, the agent (similar to a PokeMON GO player) is asked to find and collect different target objects one by one by navigating according to natural language instructions in a complex, realistic outdoor environment, and then to ARRAnge the collected objects part by part in an egocentric grid-layout environment. To support this task, we implement a 3D dynamic environment simulator and collect a dataset (in English, and also extended to Hindi) with human-written navigation and assembling instructions and the corresponding ground-truth trajectories. We also filter the collected instructions via a verification stage, leading to a total of 7.7K task instances (30.8K instructions and paths). We present results for several baseline models (integrated and biased) and metrics (nDTW, CTC, rPOD, and PTC), and the large gap between model and human performance demonstrates that our task is challenging and leaves wide scope for future work. Our dataset, simulator, and code are publicly available at: https://arramonunc.github.io
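Of the metrics listed, nDTW (normalized Dynamic Time Warping) measures how closely an agent's path follows the reference path. Below is a minimal sketch of the formulation commonly used in Vision-and-Language Navigation work, nDTW = exp(-DTW(R, Q) / (|R| · d_th)), assuming 2D waypoints, Euclidean distance, and a hypothetical success threshold `d_th = 3.0`. ArraMon's exact threshold, coordinate system, and any task-specific adjustments are not stated here and may differ.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dtw(reference: Sequence[Point], query: Sequence[Point]) -> float:
    """Classic O(n*m) dynamic time warping cost with Euclidean point distance."""
    n, m = len(reference), len(query)
    inf = float("inf")
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(reference[i - 1], query[j - 1])
            # Extend the cheapest of the three allowed alignment moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def ndtw(reference: Sequence[Point], query: Sequence[Point], d_th: float = 3.0) -> float:
    """Normalized DTW in (0, 1]; 1.0 means the query exactly traces the reference.

    d_th is an assumed distance threshold, not a value taken from ArraMon.
    """
    return math.exp(-dtw(reference, query) / (len(reference) * d_th))


if __name__ == "__main__":
    ref = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    print(ndtw(ref, ref))                                      # identical path
    print(ndtw(ref, [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]))   # path offset by 1 unit
```

Normalizing by the reference length and the threshold keeps scores comparable across paths of different lengths, and the exponential maps any non-negative DTW cost into (0, 1], so an exact trace scores 1.0 and larger deviations decay smoothly toward 0.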